Concurrent Collection, reporting custom progress data to UI when parallel tasking

I have a concurrent collection that contains 100K items. The processing of each item in the collection can take as little as 100ms or as long as 10 seconds. I want to speed things up by parallelizing the processing, and have 100 minions doing the work simultaneously. I also have to report some specific data to the UI as this processing occurs, not simply a percentage complete.
I want the parallelized sub-tasks to nibble away at the concurrent collection like a school of minnows attacking a piece of bread tossed into a pond. How do I expose the concurrent collection to the parallelized tasks? Can I have a normal loop and simply launch an async task inside the loop and pass it an IProgress? Do I even need the concurrent collection for this?
It has been recommended to me that I use Parallel.ForEach but I don't see how each sub-process established by the degrees of parallelism could report a custom object back to the UI with each item it processes, not only after it has finished processing its share of the 100K items.

The framework already provides the IProgress<T> interface for this purpose, and an implementation in Progress<T>. To report progress, call IProgress<T>.Report with a progress value. T can be any type, not just a number.
Each IProgress<T> implementation can work in its own way. Progress<T> raises its ProgressChanged event and calls the callback you pass to its constructor.
Additionally, Progress<T>.Report is asynchronous: it returns immediately, and under the covers it uses SynchronizationContext.Post to run the callback and all event handlers on the synchronization context that was current when the Progress<T> instance was created (the UI thread, if you create it there).
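Java has no SynchronizationContext, but the mechanism described above can be sketched with a single-threaded executor standing in for the UI thread. This is a minimal Java analogue for illustration only: ProgressSketch and ProgressDemo are hypothetical names, not part of .NET or any library. It shows the two properties that matter here: report() returns immediately on the worker thread, and the handler always runs on the one thread that plays the UI.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical analogue of .NET's Progress<T>: report() does not run the
// handler on the caller's thread; it posts the handler to the executor that
// was captured at construction (the role SynchronizationContext.Post plays).
final class ProgressSketch<T> {
    private final Executor uiExecutor;
    private final Consumer<T> handler;

    ProgressSketch(Executor uiExecutor, Consumer<T> handler) {
        this.uiExecutor = uiExecutor;
        this.handler = handler;
    }

    // Returns immediately; the handler runs later on the "UI" executor.
    void report(T value) {
        uiExecutor.execute(() -> handler.accept(value));
    }
}

class ProgressDemo {
    // Reports 'count' values from a pool of worker threads; every handler call
    // runs on the single "UI" thread. Returns how many reports were handled.
    static int run(int count) {
        ExecutorService ui = Executors.newSingleThreadExecutor();
        ExecutorService workers = Executors.newFixedThreadPool(4);
        AtomicInteger handled = new AtomicInteger();
        ProgressSketch<String> progress =
                new ProgressSketch<>(ui, msg -> handled.incrementAndGet());
        for (int i = 0; i < count; i++) {
            final int step = i;
            workers.execute(() -> progress.report("Processed step " + step));
        }
        try {
            workers.shutdown();
            workers.awaitTermination(10, TimeUnit.SECONDS); // all report() calls done
            ui.shutdown();
            ui.awaitTermination(10, TimeUnit.SECONDS);      // all posted handlers done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handled.get();
    }
}
```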
Assuming you create a progress value class like this:
class ProgressValue
{
    public long Step { get; set; }
    public string Message { get; set; }
}
You could write something like this:
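// Create the Progress<T> on the UI thread so its callback runs there.
IProgress<ProgressValue> myProgress = new Progress<ProgressValue>(p =>
{
    myProgressBar.Value = (int)p.Step;
});
IList<int> myVeryLargeList = ...;
// Run the loop off the UI thread: Parallel.ForEach blocks its caller, and a
// blocked UI thread could not run the posted progress callbacks until the end.
await Task.Run(() => Parallel.ForEach(myVeryLargeList, (item, state, step) =>
{
    // Do some heavy work
    // Note: step is the item's index, not a count of completed items.
    myProgress.Report(new ProgressValue
    {
        Step = step,
        Message = String.Format("Processed step {0}", step)
    });
}));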
EDIT
Oops! Progress<T> implements IProgress<T>.Report explicitly, so you have to call it through an IProgress<T> reference, as @Tim noticed.
I fixed the code to declare myProgress explicitly as an IProgress<ProgressValue>.
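The question's "school of minnows" picture can also be sketched directly, without Parallel.ForEach doing the partitioning: a fixed number of workers keep taking the next item from one shared concurrent queue and report a custom progress object after every item. The sketch below is in Java and assumes a ConcurrentLinkedQueue as the shared collection, a single-threaded executor as the stand-in UI thread, and squaring an integer as placeholder work; MinnowPool is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Custom progress payload, mirroring the ProgressValue class above.
final class ProgressValue {
    final long step;      // how many items have completed so far
    final String message; // arbitrary data for the UI
    ProgressValue(long step, String message) {
        this.step = step;
        this.message = message;
    }
}

class MinnowPool {
    // Processes every item with 'workerCount' workers that nibble at a shared
    // queue; returns the progress values in the order the "UI" received them.
    static List<ProgressValue> process(List<Integer> items, int workerCount) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>(items);
        ExecutorService ui = Executors.newSingleThreadExecutor();
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        List<ProgressValue> received = Collections.synchronizedList(new ArrayList<>());
        AtomicLong completed = new AtomicLong();

        for (int w = 0; w < workerCount; w++) {
            workers.execute(() -> {
                Integer item;
                // Each worker keeps taking the next item until the queue is empty.
                while ((item = queue.poll()) != null) {
                    int result = item * item; // stand-in for the heavy work
                    long done = completed.incrementAndGet();
                    ProgressValue p = new ProgressValue(done, "Item " + item + " -> " + result);
                    ui.execute(() -> received.add(p)); // marshal to the "UI" thread
                }
            });
        }
        try {
            workers.shutdown();
            workers.awaitTermination(1, TimeUnit.MINUTES);
            ui.shutdown();
            ui.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }
}
```

Because step is a completion counter taken before posting, two workers can post their reports in either order, so the UI may occasionally see step 6 just before step 5. If strict ordering matters, the UI side can keep the maximum step seen so far.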

Related

TPL Dataflow block with an inner life

I've been doing a bit of TPL dataflow coding and am quite happy with the basics. The question I have is, how would I go about doing a TPL block that, besides reacting to its queue, also has a life on its own?
Like, a background task that runs on a permanent loop, polling the odd webservice or database every few seconds and emitting messages on its outputs when it sees fit?
The interface would be a source- and target block, but with no apparent connection between source messages and target messages.
Basically an "active" block.
Blocks aren't background tasks or workers. They don't have a lifetime of their own outside the pipeline and their messages. That doesn't mean you can't do what you ask, though; you'll just have to trigger the head block from the outside somehow.
One way to do what you want is to create a timer that "pings" the head block of a pipeline:
var head = new TransformBlock<int,...>(...);
...
var timer = new System.Threading.Timer(_ => head.Post(0), null, 0, 5000);
You can start, stop or change the timer as needed.
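As a rough Java analogue of the timer approach, a ScheduledExecutorService can play the role of System.Threading.Timer and a BlockingQueue the role of the head block's input buffer. TimerPing, the tick message and the parameter names are illustrative assumptions; the point is only that the pipeline itself stays passive and something outside it delivers a message on a schedule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class TimerPing {
    // Pings the pipeline head every 'periodMillis' ms and returns the
    // messages the head stage produced within 'runMillis' ms.
    static List<String> run(long periodMillis, long runMillis) {
        BlockingQueue<Integer> headInput = new LinkedBlockingQueue<>();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Like new Timer(_ => head.Post(0), null, 0, 5000): no initial delay, fixed period.
        ScheduledFuture<?> ping = timer.scheduleAtFixedRate(
                () -> headInput.offer(0), 0, periodMillis, TimeUnit.MILLISECONDS);

        List<String> produced = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(runMillis);
        try {
            // The "head block": does its periodic work only when a ping arrives.
            while (System.nanoTime() < deadline) {
                Integer tick = headInput.poll(10, TimeUnit.MILLISECONDS);
                if (tick != null) {
                    produced.add("polled service at tick #" + produced.size());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            ping.cancel(false); // stopping the timer stops the pings
            timer.shutdownNow();
        }
        return produced;
    }
}
```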
You could use this, e.g., to process the files in a folder periodically:
var crawler = new TransformManyBlock<string, string>(root =>
{
    return Directory.EnumerateFiles(root, "*.csv");
});
var parseCsv = new TransformBlock<string, Record[]>(async filePath =>
{
    var records = await parseCsvAsync(filePath);
    return records;
});
...
// crawler is the head block of this pipeline
var timer = new System.Threading.Timer(_ => crawler.Post(rootFolder), null, 0, 5000);
So, here's what I did in the end:
My FSM class consisted of a BroadcastBlock working as input, a BufferBlock working as output, and an async method with an endless loop running on the thread pool. The async method gets the current input value from the BroadcastBlock whenever needed, does its thing, and posts a result to the BufferBlock, in my case every second.
A user of that FSM class can link to the BroadcastBlock input and the BufferBlock output. The async method polls the input when needed and gets new values. Whatever it does, the result goes to the BufferBlock.
The interface then looks like this:
public interface IFsmBlock : IAsyncDisposable
{
    ITargetBlock<string?> SearchParameterInputPort { get; }
    ISourceBlock<string> FoundEntriesOutputPort { get; }
}
There may be more than one input or output block. In this instance, the FSM treats the input port basically as a variable containing a value. If the input port should trigger something, that's easily possible by using an ActionBlock instead of the BroadcastBlock.
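The same shape (an input port that behaves like a variable holding the latest value, a buffered output port, and a loop that gives the block its own life) can be sketched in Java. This is a hypothetical analogue, not a Dataflow API: an AtomicReference stands in for the BroadcastBlock, a LinkedBlockingQueue for the BufferBlock, and AutoCloseable for IAsyncDisposable.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical Java analogue of IFsmBlock.
final class ActiveBlock implements AutoCloseable {
    private final AtomicReference<String> searchParameter = new AtomicReference<>();
    private final BlockingQueue<String> foundEntries = new LinkedBlockingQueue<>();
    private final ExecutorService loop = Executors.newSingleThreadExecutor();

    ActiveBlock(long periodMillis) {
        // The block's own "life": a loop that runs until close() is called.
        loop.execute(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    String current = searchParameter.get(); // read the input like a variable
                    if (current != null) {
                        foundEntries.offer("result for " + current); // stand-in for polling a service
                    }
                    TimeUnit.MILLISECONDS.sleep(periodMillis);
                }
            } catch (InterruptedException e) {
                // close() interrupts the loop; fall through and let the thread end.
            }
        });
    }

    // Input port: each post overwrites the previous value.
    void postSearchParameter(String value) {
        searchParameter.set(value);
    }

    // Output port: returns the next result, or null if none arrives in time.
    String takeFoundEntry(long timeoutMillis) {
        try {
            return foundEntries.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    @Override
    public void close() {
        loop.shutdownNow(); // interrupts the background loop
    }
}
```

Closing the block interrupts the loop, which plays the role that DisposeAsync plays for the endless loop in the original design.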

Are hot non completing database observables a Rx usecase? Side-effect writing issue

This is more of an opinion question: should this pattern, which many people use, be an Rx use case?
In apps there is usually an SQL database, which the UI queries as an observable that emits once the query has loaded and again any time the data changes (Room, SqlDelight, etc.).
Reads sound okay; however, is it possible to have "pure" writes to the database?
Writing to the database might look like this:
fun sync() = Completable.fromCallable {
// do something
database.writeSomethingSynchronously()
}
SomeUi {
init {
database.someQueryObservable()
.subscribe { show list }
}
}
Imagine you want to display progressbar while this Completable is in flight.
What is effectively happening here is side-effecting to the database. This means the open database observable will re-emit when the data is written, but still before sync() returns (assuming a single thread for simplicity).
So there is a point in time where the UI already shows the new data while the progress bar is still shown (and the timing gets worse with multithreading). This is an invalid state.
In the imperative world, sync would provide a completion callback, in which one would reload the query manually and show/hide the progress bar synchronously (and somehow block the database change listener for the duration of the sync writes?).
Is there a way around this at all?

How can I have multiple contexts handle events in Apama

I am trying to define a monitor in which I receive events and then handle them on multiple contexts (roughly equivalent to threads, if I understand correctly). I know I can write
spawn myAction() to myNewContext;
and this will run that action in the new context.
However I want to have an action which will respond to an event when it comes into my monitor:
on all trigger() as t {
    doMyThing();
}
on all otherTrigger() as ot {
    doMyOtherThing();
}
Can I define my on all in a way that uses a specific context? Something like
on all trigger() as t in myContext {
    doMyThing();
}
on all otherTrigger() as t in myOtherContext {
    doMyOtherThing();
}
If not what is the best way to define this in Apama EPL? Also could I have multiple contexts handling the same events when they arrive, round robin style?
Apama events from external receivers (i.e. the outside world) are delivered only to public contexts, including the 'main' context. So depending on your architecture, you can either spawn your action to a public context:
// set the receivesInput parameter to true to make this context public
spawn myAction() to context("myContext", true);
...
action myAction() {
    on all trigger() as t {
        doMyThing();
    }
}
or spawn your action to a private context and set up an event forwarder in a public context, usually the main context (which always exists):
spawn myAction() to context("myNewContext");
on all trigger() as t {
    send t to "myChannel"; // forward all trigger events to the "myChannel" channel
}
...
action myAction() {
    monitor.subscribe("myChannel"); // receive all events delivered to the "myChannel" channel
    on all trigger() as t {
        doMyThing();
    }
}
Spawning to a private context and leveraging the channels system is generally the better design, because it only sends events to the contexts that care about them.
To extend a bit on Madden's answer (I don't have enough rep to comment yet): private contexts with forwarders are also the only way to achieve true round-robin; otherwise all contexts receive all events. The easiest approach is a partitioning strategy (e.g. IDs ending in 0 go to context-0, or you have one context per machine you're monitoring), because then each concern is tracked in a single context and you don't have to share state.
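The partitioning idea can be illustrated outside EPL. In this Java sketch (an analogue, not Apama code), each single-threaded executor plays the role of a private context and forward() plays the role of the forwarder in the main context; routing by ID modulo the number of contexts is one assumed partitioning rule. Because every event for a given ID lands on the same thread, each context's per-ID state is never shared.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PartitionedContexts {
    private final List<ExecutorService> contexts = new ArrayList<>();
    // One state map per context; each map is touched only by its own context's thread.
    private final List<Map<Integer, Integer>> statePerContext = new ArrayList<>();

    PartitionedContexts(int n) {
        for (int i = 0; i < n; i++) {
            contexts.add(Executors.newSingleThreadExecutor());
            statePerContext.add(new HashMap<>());
        }
    }

    // Partitioning rule: the same ID always maps to the same context.
    int contextFor(int id) {
        return Math.floorMod(id, contexts.size());
    }

    // The "forwarder": routes an event for 'id' to its owning context.
    void forward(int id) {
        int c = contextFor(id);
        Map<Integer, Integer> state = statePerContext.get(c);
        contexts.get(c).execute(() -> state.merge(id, 1, Integer::sum));
    }

    // Snapshots each context's state on its own thread, then stops the contexts.
    Map<Integer, Integer> shutdownAndCollect() {
        List<Future<Map<Integer, Integer>>> snapshots = new ArrayList<>();
        for (int i = 0; i < contexts.size(); i++) {
            Map<Integer, Integer> state = statePerContext.get(i);
            Callable<Map<Integer, Integer>> snapshot = () -> new HashMap<>(state);
            // Runs after every event already queued to this context (FIFO order).
            snapshots.add(contexts.get(i).submit(snapshot));
            contexts.get(i).shutdown();
        }
        Map<Integer, Integer> all = new HashMap<>();
        try {
            for (Future<Map<Integer, Integer>> f : snapshots) {
                all.putAll(f.get());
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return all;
    }
}
```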
Also could I have multiple contexts handling the same events when they arrive, round robin style?
This isn't entirely clear to me. What benefit are you aiming for here? If you're looking to reduce latency by having the "next available" context pick up the event, this probably isn't the right way to achieve it: deciding which context processes the event means you'd need inter-context communication and coordination, which will increase latency. If you want multiple contexts to process the same events (e.g. one context runs your temperature-spike rule and another runs your long-term temperature-average rule, but both take temperature readings as inputs), then that's a good approach, but it's not what I'd call round-robin.

What is considered overloading the main thread?

I am displaying information from a data model on a user interface. My current approach to doing so is by means of delegation as follows:
@protocol DataModelDelegate <NSObject>
- (void)updateUIFromDataModel;
@end
I am implementing the delegate method in my controller class as follows, using GCD to push the UI updating to the main thread:
- (void)updateUIFromDataModel {
dispatch_async(dispatch_get_main_queue(), ^{
// Code to update various UI controllers
// ...
// ...
});
}
What I am concerned about is that in some situations this method can be called very frequently (~1000 times per second, each call updating multiple UI objects), which feels to me very much like 'spamming' the main thread with commands.
Is this too much to be sending to the main thread? If so does anyone have any ideas on what would be the best way of approaching this?
I have looked into dispatch_apply, but that appears to be more useful when coalescing data, which is not what I am after - I really just want to skip updates if they are too frequent so only a sane amount of updates are sent to the main thread!
I was considering taking a different approach and implementing a timer instead to constantly poll the data, say every 10 ms, however since the data updating tends to be sporadic I feel that it would be wasteful to do so.
Combining both approaches, another option I have considered would be to wait for an update message and respond by setting the timer to poll the data at a set interval, and then disabling the timer if the data appears to have stopped changing. But would this be over-complicating the issue, and would the sane approach be to simply have a constant timer running?
edit: Added an answer below showing the adaptations using a dispatch source
One option is to use a Dispatch Source with type DISPATCH_SOURCE_TYPE_DATA_OR which lets you post events repeatedly and have libdispatch combine them together for you. When you have something to post, you use dispatch_source_merge_data to let it know there's something new to do. Multiple calls to dispatch_source_merge_data will be coalesced together if the target queue (in your case, the main queue) is busy.
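The coalescing behaviour can be sketched in Java to show why bursts collapse. This is a hypothetical analogue of a DATA_ADD source (the variant the asker's code below uses), not a GCD API: merge() adds to a pending total, and only the call that moves the total away from zero schedules the handler on the target executor, so any merges that arrive before the handler runs are folded into the value it receives.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Hypothetical analogue of a DISPATCH_SOURCE_TYPE_DATA_ADD source.
final class CoalescingSource {
    private final AtomicLong pending = new AtomicLong();
    private final Executor target;       // e.g. the UI thread's executor
    private final LongConsumer handler;  // receives the merged total

    CoalescingSource(Executor target, LongConsumer handler) {
        this.target = target;
        this.handler = handler;
    }

    // Like dispatch_source_merge_data(source, n); n must be positive.
    void merge(long n) {
        if (n <= 0) {
            return; // merging zero has no effect
        }
        // Only the call that moves the total away from 0 schedules the handler;
        // later calls just add to the total the pending handler will pick up.
        if (pending.getAndAdd(n) == 0) {
            target.execute(() -> handler.accept(pending.getAndSet(0)));
        }
    }
}
```

With a target that merely queues runnables, five quick merge(1) calls schedule a single handler run, which then receives 5, much like the jumps to 10-20 the asker observed under load.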
I have been experimenting with dispatch sources and got it working as expected now - Here is how I have adapted my class implementation in case it is of use to anyone who comes across this question:
@implementation AppController {
@private
    dispatch_source_t _gcdUpdateUI;
}
- (void)awakeFromNib {
// Added the following code to set up the dispatch source event handler:
_gcdUpdateUI = dispatch_source_create(DISPATCH_SOURCE_TYPE_DATA_ADD, 0, 0,
dispatch_get_main_queue());
dispatch_source_set_event_handler(_gcdUpdateUI, ^{
// For each UI element I want to update, pull data from model object:
// For testing purposes - print out a notification:
printf("Data Received. Messages Passed: %lu\n",
dispatch_source_get_data(_gcdUpdateUI));
});
dispatch_resume(_gcdUpdateUI);
}
And now in the delegate method I have removed the call to dispatch_async, and replaced it with the following:
- (void)updateUIFromDataModel {
dispatch_source_merge_data(_gcdUpdateUI, 1);
}
This is working absolutely fine for me. Now, even during the most intense data updates, the UI stays perfectly responsive.
Although the printf() output was a very crude way of checking whether the coalescing works, a quick scroll back through the console output showed that the vast majority of the printouts had a value of 1 (easily 98% of them); however, there were intermittent jumps to around 10-20, reaching a peak of just over 100 coalesced messages around the time the model was sending the most update messages.
Thanks again for the help!
If the app beach-balls under heavy load, then you've blocked the main thread for too long and you need to implement a coalescing strategy for UI updates. If the app remains responsive to clicks, and doesn't beach-ball, then you're fine.

Can someone explain callback/event firing

In a previous SO question it was recommended to me to use callback/event firing instead of polling. Can someone explain this in a little more detail, perhaps with references to online tutorials that show how this can be done for Java-based web apps?
Thanks.
The definition of a callback from Wikipedia is:
In computer programming, a callback is executable code that is passed as an argument to other code. It allows a lower-level software layer to call a subroutine (or function) defined in a higher-level layer.
In its most basic form, a callback could be used like this (pseudocode):
void function Foo()
{
    MessageBox.Show("Operation Complete");
}
void function Bar(Method myCallback)
{
    //Perform some operation
    //When completed, execute the callback method
    myCallback.Invoke();
}
static void Main()
{
    Bar(Foo); //Pops a message box when Bar is completed
}
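The same idea in Java, where there are no bare function pointers: a functional interface such as java.util.function.Consumer plays the role of the Method parameter, and a method reference or lambda is passed as the callback. CallbackDemo is a hypothetical name; this is a minimal sketch.

```java
import java.util.function.Consumer;

class CallbackDemo {
    // Bar from the pseudocode: does its work, then calls whatever callback it was given.
    static void bar(Consumer<String> myCallback) {
        String result = "Operation Complete"; // stand-in for "perform some operation"
        myCallback.accept(result);            // hand control back to the caller's code
    }

    public static void main(String[] args) {
        // Foo from the pseudocode, passed as a method reference: prints "Operation Complete".
        bar(System.out::println);
    }
}
```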
Modern languages have standardized ways of doing this: C# calls them events, while Java uses listener interfaces (the model behind AWT and Swing). In C#, an event is simply a special kind of class member that holds a list of delegates (method pointers, or callbacks; for this purpose all three are essentially the same thing). When the event gets "fired", it simply iterates through its list of callbacks and executes them. These callbacks are also referred to as listeners.
Here's an example in C#:
public class Button
{
    public event Action Clicked;

    protected virtual void OnMouseUp()
    {
        //User has clicked on the button. Let's notify anyone listening to this event.
        Clicked?.Invoke(); //Iterates through all the callbacks in its list and invokes each one
    }
}
public class MyForm
{
    private Button _Button;

    public MyForm()
    {
        _Button = new Button();
        //Different languages provide different ways of registering listeners to events.
        //In C#:
        _Button.Clicked += Button_Clicked_Handler;
        //Elsewhere it may look like: _Button.Clicked.AddListener(Button_Clicked_Handler);
    }

    public void Button_Clicked_Handler()
    {
        MessageBox.Show("Button Was Clicked");
    }
}
In this example the Button class has an event called Clicked. It allows anyone who wants to be notified when the button is clicked to register a callback method. In this case the Button_Clicked_Handler method would be executed by the Clicked event.
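Since Java has no event keyword, the equivalent is usually written by hand as a listener interface plus a list of registered listeners, which is the bookkeeping a C# event does for you. The names below (ClickListener, addClickListener, onMouseUp) are hypothetical and only mirror the example above; CopyOnWriteArrayList is one common choice so that listeners can be added or removed while the event is firing.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// The callback contract every listener implements.
interface ClickListener {
    void clicked();
}

class Button {
    private final List<ClickListener> listeners = new CopyOnWriteArrayList<>();

    void addClickListener(ClickListener l) { listeners.add(l); }
    void removeClickListener(ClickListener l) { listeners.remove(l); }

    // Called by the (hypothetical) input system when the mouse button is released.
    void onMouseUp() {
        for (ClickListener l : listeners) {
            l.clicked(); // "firing" the event = calling every registered callback
        }
    }
}

class MyForm {
    final Button button = new Button();
    int clicks = 0;

    MyForm() {
        button.addClickListener(this::buttonClickedHandler); // register the callback
    }

    void buttonClickedHandler() {
        clicks++; // stand-in for MessageBox.Show("Button Was Clicked")
    }
}
```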
Eventing/Callback architecture is very handy whenever you need to be notified that something has occurred elsewhere in the program and you have no direct knowledge of when or how this happens.
This greatly simplifies notification. Polling makes it much more difficult because you are responsible for checking every so often whether or not an operation has completed. A simple polling mechanism would be like this:
static void CheckIfDone()
{
while(!Button.IsClicked)
{
//Sleep
}
//Button has been clicked.
}
The problem is that this particular approach blocks your current thread, which has to keep checking until Button.IsClicked is true. The nice thing about an event architecture is that it is asynchronous and lets the acting item (the button) notify the listener when something happens, instead of the listener having to keep checking.
The difference between polling and callback/event is simple:
Polling: you keep asking, continuously or at fixed intervals, whether some condition is met, for example, whether some keyboard key has been pressed.
Callback: you tell some driver, other code or whatever: "When something happens (a key has been pressed, in our example), call this function", and you pass it the function you want called when the event happens. This way, you can "forget" about that event, knowing that it will be handled correctly when it happens.
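To make the contrast concrete, here is a small Java sketch (hypothetical names; the integer key code is an assumed stand-in for a keyboard event). The polling method ties up its thread checking a flag, while the callback version registers what to do via CompletableFuture.thenApply and returns at once; that code runs later, when whoever produces the key press completes the future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

class PollVsCallback {
    // Polling: keep asking "has it happened yet?"; this thread does nothing else meanwhile.
    static int pollUntilDone(AtomicBoolean done) {
        int checks = 0;
        while (!done.get()) {
            checks++;
            Thread.onSpinWait(); // or sleep; either way the thread is tied up checking
        }
        return checks;
    }

    // Callback: say what should happen when the key press arrives, then return immediately.
    static CompletableFuture<String> onKeyPress(CompletableFuture<Integer> keyPress) {
        return keyPress.thenApply(key -> "handled key " + key);
    }
}
```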
Callback is when you pass a function/object to be called/notified when something it cares about happens. This is used a lot in UI - A function is passed to a button that is called whenever the button is pressed, for example.
There are two players involved in this scenario. First you have the "observed" which from time to time does things in which other players are interested. These other players are called "observers". The "observed" could be a timer, the "observers" could be tasks, interested in alarm events.
This pattern (the Observer pattern) is described in the book "Design Patterns: Elements of Reusable Object-Oriented Software" by Gamma, Helm, Johnson and Vlissides.
Two examples:
The SAX parser for XML walks through an XML file and raises events each time an element is encountered. A listener can listen for these elements and do something with them.
Swing and AWT are based on this pattern. When the user moves the mouse, clicks, or types something on the keyboard, these actions are converted into events. The UI components listen to these events and react to them.
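The SAX example maps directly onto the JDK's javax.xml.parsers API: you hand the parser a DefaultHandler, and the parser calls its startElement method for every element it meets. This minimal sketch just collects element names; SaxDemo is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxDemo {
    // Returns the element names in the order the SAX parser reported them.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        // Our listener: the parser calls startElement each time it meets an element.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                names.add(qName);
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return names;
    }
}
```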
Being notified via an event is almost always preferable to polling, especially if hardware is involved and the event originates from a driver issuing a CPU interrupt. In that case, you're not using any CPU at all while you wait for some piece of hardware to complete a task.
