Coalescing GCD file system events - macos

I have a class that implements a file-monitoring service to detect when a file I am interested in has been changed by something other than my application. I use the standard technique of opening the file (with the O_EVTONLY flag) and binding the file descriptor to a Grand Central Dispatch source of type DISPATCH_SOURCE_TYPE_VNODE. When I get an event, I notify my main thread with NSNotificationCenter's postNotificationName:object:userInfo: which calls an observer in my app delegate. So far so good. It works great. But, in general, if the triggering event is an attributes change (i.e. the DISPATCH_VNODE_ATTRIB flag is set on return from dispatch_source_get_data()) then I usually get two closely-spaced events. The behaviour is easily exhibited if I touch(1) the object I am monitoring. I hypothesise this is due to the file's mtime and atime being set non-atomically although I can't verify this. This can lead to spurious notifications being sent to my observer and this raises the possibility of race conditions etc.
What is the best way of dealing with this? I thought of storing a timestamp for the last event received and only sending a notification if the current event is later than this timestamp by some amount (a few tens of milliseconds?) Does this sound like a reasonable solution?

You can't ever escape the "race condition" in this situation, because the notification of your GCD event source in your process is not synchronous with the other process's modification of the underlying file. So, no matter what, you must always be tolerant of the possibility that the change you're being notified for could already be "gone."
As for coalescing, do whatever makes sense for your app. There are two obvious strategies. You can act immediately on a received event, and then drop subsequent events received in some time window on the floor, or you can delay every event for some time period during which you will drop other events for the same file on the floor. It really just depends on what's more important, acting quickly, or having a higher likelihood of a quiescent state (knowing that you can never be sure things are quiescent.)
The only thing I would add is to suggest that you do all your coalescence before dispatching anything to the main thread. The main thread has things like tracking loops, etc that will make it harder to get time-based coalescing right in certain cases.

Related

Event Sourcing: multiple events vs a single "StatusChanged"

Assuming the common "Order" aggregate, my view of events is that each should be representative of the command that took place. E.g. OrderCreated, OrderePicked, OrderPacked, OrderShipped.
Applying these events in the aggregate changes the status of the order accordingly.
The problem:
I have a projector that lists all orders in the system and their statuses. So it consumes the events, and like with the aggregate "apply" method, it implements the logic that changes the status of the order.
So now the logic exists in two places, which is... not good.
A solution to this is to replace all the above events with a single StatusChanged event that contains a property with the new status.
Pros: both aggregate and projectors just need to handle one event type, and set the status to what's in that event. Zero logic.
Cons: the list of events is now very implicit. Instead of getting a list of WHAT HAPPENED (created, packed, shipped, etc.), we now have a list of the status changes events.
How do you prefer to approach this?
Note: this is not the full list of events. other events contain other properties, so clearly they don't belong to this problem. the problem is with events that don't contain any info, just change the status of an order.
In general it's better to have more finer-grained events, because this preserves context (and means that you don't have to write logic to reconstruct the context in your consumers).
You typically will have at most one projector which is duplicating your aggregate's event handler. If its purpose is actually to duplicate the aggregate's event handler (e.g. update a datastore which facilitates cross-aggregate querying), you may want to look at making that explicit as a means of making the code DRY (e.g. function-as-value, strategy pattern...).
For the other projectors you write (and there will be many as you go down the CQRS/ES road), you're going to be ignoring events that aren't interesting to that projection and/or doing radically different things in response to the events you don't ignore. If you go down the road of coarse events (CRUD being about the limit of coarseness: a StatusChanged event is just the "U" in CRUD), you're setting yourself up for either:
duplicating the aggregate's event handling/reconstruction in the projector
carrying oldState and newState in the event (viz. just saying StatusChanged { newState } isn't sufficient)
before you can determine what changed and the code for determining whether a change is interesting will probably be duplicated and more complex than the aggregate's event-handling code.
The coarser the events, the greater the likelihood of eventually having more duplication, less understandability, and worse performance (or higher infrastructure spend).
So now the logic exists in two places, which is... not good.
Not necessarily a problem. If the logic is static, then it really doesn't matter very much. If the logic is changing, but you can coordinate the change (ex: both places are part of the same deployment bundle), then its fine.
Sometimes this means introducing an extra layer of separation between your "projectors" and the consumers - ex: something that is tightly coupled to the aggregate watching the events, and copying status changes to some (logical) cache where other processes can read the information. Thus, you preserve the autonomy of your component without compromising your event stream.
Another possibility to consider is that we're allowed to produce more than one event from a command - so you could have both an OrderPicked event and a StatusChanged event, and then use your favorite filtering method for subscribers only interested in status changes.
In effect, we've got two different sets of information to track to remember later - inputs (information in the command, information copied from local caches), and also things we have calculated from those inputs, prior state, and the business policies that are now in effect.
So it may make sense to separate those expressions of information anyway.
If event sourcing is a good approach for the problems you are solving, then you are probably working on problems that are pretty important to the business, where specialization matters (otherwise, licensing an off the shelf product and creating adapters would be more cost effective). In which case, you should probably be expecting to invest in thinking deeply about the different trade offs you need to make, rather than hoping for a one-size-fits-all solution.

What is the Proper Non-blocking Algorithm for Modbus Master on Microcontroller

Modbus is a a request and response type serial communication. Basically the master send out a request and one of the slave response.
I am modifing the code on a microcontroller which is a master unit on a modbus network. This unit also has a small dot-matrix LCD and some buttons for user interface. The microcontroller is running at 16MHz.
The problem is after the master unit send out a request, it does not know when the slave response, so it may need to wait for a relatively long time. However as this unit has buttons and LCD, it can not wait at a point for too long because the user will feel lag when he pressed a button. The original code is using a RTOS. It seperate the user interface task and the serial communication tasks so it has no problem. Now I need to change it to non-RTOS code. I have implemented a system tick timer which will interrupt at each 1ms. What is the proper (or common) way to do that?
It is possible to do quite a lot with just a single task, especially if you have interrupts. The intermediate position between a single very simple task and an RTOS is a cyclic executive. See http://www3.nd.edu/~cpoellab/teaching/cse40463/slides10.pdf for a brief overview of the spectrum of functionality from a cyclic executive up to a fully preemptive multitasking operating system. You will find much more if you search on this phrase and related phrases, including very sophisticated schemes for making sure that the system never misses its deadlines. If you are an aircraft flight control system, forgetting to check the aircraft pitch angle every X ms can cause problems elsewhere :-)
One way to rewrite code which is naturally multi-threaded is to maintain a model of the state of the system, such as a collection of objects each representing a modbus connection, indexed by a connection id. Then write a routine for every sort of event that can happen, including the arrival of a clock interrupt. When that event happens these routines typically work out which connection is involved, retrieve it from the main collection (or create it from scratch and enter it there if necessary) do the work associated with that particular sort of event, and then return.
It is often convenient to keep a queue of future events, indexed by time, and to have a routine that creates an object representing something to be done at some future time (such as calling a method to check for the expiration of a timeout) and puts this object on the queue.
You need to worry about interrupt processing getting called halfway through an event service routine. One way to deal with this is to lock out interrupts when that could cause a problem. Another way is to have the interrupt routine do nothing more than put an object on a queue that something else will check for later, or just set a flag. Then you need only lock out interrupts when you are checking for items on the queue and removing them.
A number of communications protocols are implemented in this way. Even in a true multitasking operating system you very often don't want to have to create a new thread every time you need to create a new connection. The two main problems with this is that the code is less clear than code which has a thread per object, because stuff that naturally goes together is chopped up into loads of event service events, and if any of the event service methods burn significant amounts of cpu, the system will stall because nothing else will happen when this is going on.

Events changing state in CQRS

This should be easy to follow, but after some reading I still can find an answer.
So, say that the user needs to change his mobile number, to accomplished that, we might have a command as: ChangedUserMobileNumber
holding the new number. The domain responsible for handling the command will perform the change in the aggregate and publish an event: UserMobilePhoneChanged
There is a subscriber for that event in another domain, which also holds the user mobile number in its aggregate but according to our software architect, events can not old any data so what we end up is rather stupid to say the least:
The Domain 1, receives the command to update the mobile number, the number is updated and one event is published, also, because the event cannot hold data, the command handler in the Domain 1 issues yet another command which is sent to Domain 2. The subscriber of that event lives in Domain 2 too, we then have a Saga to handle both the event and the command.
In terms of implementation we are using NServiceBus, so we have this saga to handle these message and in it we have this line of code, where the entity.IsMobilePhoneUpdated field stored in a saga entity is changed when the event is handeled.
bool isReady = (entity.IsMobilePhoneUpdated && entity.MobilePhoneNumber != null);
Effectively the Saga is started by both the command and the event raised in the Domain 1, and until this condition is met, the saga is kept alive.
If it was up to me, I would be sending the mobile number in the event itself, I just want to get a few other opinions on this.
Thanks
I'm not sure how a UserMobilePhoneChanged event could be useful in any way unless it contained the new phone number. User asks to change a number, the event shoots out that it has. Should be very simple indeed. Why does your architect say that events shouldn't contain any information?
In the first event based system i've designed events also had no data. I also did enforce that rule. At the time that sounded like a clever decision. After a while i realised that it was dumb, and i was making a lot of workarounds because of it. Also this caused a lot of querying form the event subscribers, even for trivial data. I had no problem changing this "rule" after i realised i'm doing it wrong.
Events should have all the data required to make them meaningful. Also they should only have the data that makes sense for that event. ( No point in having the user address in a ChangePhoneNumber message )
If your architect imposes such a restriction, it's not going to be easy to develop a CQRS system. How are the read models updated? Since the events have no data then you either query something to get the data ( the write side ? ) of find some way of sending a command to the read model ( then what's the point of publishing events? ). To fix your problem you should try to have a professional discussion with this architect, preferably including other tech heads and without offending anybody try to get him to relax this constraint.
On argument you could use is Event Sourcing. Event Sourcing is complementary to CQRS and would not make sense without events that have data. Even more when using event sourcing, the only data you have is the data stored in the events. Even if you don't actually implement event sourcing you can use it's existence as a reason for events to have data.
There is little point in finding a technical solution to a people problem.

When to use events?

At work, we have a huge framework and use events to send data from one part of it to another. I recently started a personal project and I often think to use events to control the interactions of my objects.
For example, I have a Mixer class that play sound effects and I initially thought I should receive events to play a sound effect. Then I decided to only make my class static and call
Mixer.playSfx(SoundEffect)
in my classes. I have a ton of examples like this one where I initially think of an implementation with events and then change my mind, saying to myself it is too complex for nothing.
So when should I use events in a project? In which occasions events have a serious advantage over others techniques?
You generally use events to notify subscribers about some action or state change that occurred on the object. By using an event, you let different subscribers react differently, and by decoupling the subscriber (and its logic) from the event generator, the object becomes reusable.
In your Mixer example, I'd have events signal the start and end of playing of the sound effect. If I were to use this in a desktop application, I could use those events to enable/disable controls in the UI.
The difference between Calling a subroutine and raising events has to do with: Specification, Election, Cardinality and ultimately, which side, the initiator or the receiver has Control.
With Calls, the initiator elects to call the receiving routine, and the initiator specifies the receiver. And this leads to many-to-one cardinality, as many callers may elect to call the same subroutine.
With Events on the other hand, the initiator raises an event that will be received by those routines that have elected to receive that event. The receiver specifies what events it will receive from what initiators. This then leads to one-to-many cardinality as one event source can have many receivers.
So the decision as to Calls or Events, mostly has to do with whether the initiator determines the receiver is or the receiver determines the initiator.
Its a tradeoff between simplicity and re-usability. Lets take an metaphor of "Sending the email" process:
If you know the recipients and they are finite in number that you can always determine, its as simple as putting them in "To" list and hitting the send button. Its simple as thats what we use most of the time. This is calling the function directly.
However, in case of mailing list, you don't know in advance that how many users are going to subscribe to your email. In that case, you create a mailing list program where the users can subscribe to and the email goes automatically to all the subscribed users. This is event modeling.
Now, even though, in both above option, emails are sent to users, you are a better judge of when to send email directly and when to use the mailing list program. Apply the same judgement, hope that you would get your answer :)
Cheers,
Ajit.
I have been working with a huge code base at my previous work place and have seen, that using events can increase the complexity quite a lot and often unnecessarily.
I had often to reverse engineer existing code in order to fix it or to extend it.
In both cases, it is a lot easier to understand what is going on, when you can simply read a list of function calls instead of just seeing the raise of an event.
The event forces you to look for usages in order to fully understand what is happening. Not a problem with modern IDEs, but if you then encounter many functions, which also raise events, it quickly becomes complex. I had encountered cases, where it mattered in what order functions did subscribe to an event, even though most languages don't even gurantee a calling order...
There are cases when it is a really good idea to use events. But before you start eventing, consider the alternative. It is probably easier to read and mantain.
A Classic example for the use of events is a UI framework, which provides elements like buttons etc.
You want the function "ButtonPressed()" of the framework to call some of your functions, so that you can react to the user action.
The alternative to an event that you can subscribe to, would for example be a public bool "buttonPressed", which the UI framework exposes
and which you can regurlary check for beeing true or false. This is of course very ineffecient, when there are hundreds of UI elements.

I Need an Analogy: Triggers and Events

For another question, I'm running into a misconception that seems to arise here at SO occasionally. Some questioners seem to think that Triggers are to Databases as Events are to OOP.
Does anyone have a good analogy to explain why this is a flawed comparison, and the consequences of misapplying it?
EDIT:
Bill K. has hit it correctly, but maybe doesn't see the importance of the critical differeence between the event and the callback function that strikes me, anyway. Triggers actually cause code to execute every time the event occurs; callbacks only occur whenever one has been registered for an event (which is not true for the vast majority of events); and even then, in most cases the callback's first action is to deregister itself (or at least the callback contains a qualifcation exit so it only executes once.)
If you write a trigger, it will unfailingly execute every time the event occurs, because there's no way to register or deregister to code segment.
Triggers are a way to interpose repeating logic synchronously into the thread of execution (i.e. synchronicity). Events are a means to defer logic until later (i.e. implement asynchronicity).
There are exceptions and mitigations in both cases, but the basic patterns of triggers and callbacks are mostly opposite in intention and implementation. Often the distinction doesn't seem to have fully sunk in. (IMHO, YMMV). :D
They're not the same thing, but they're not unrelated.
In both cases, the mechanism can be described approximately as follows:
Some block of code declares "interest" for changes in state.
Your application affects some change.
The system runs the block of code in response to the change.
Perhaps a database trigger is more like a callback function that has registered interest in a specific event.
Here's an analogy: the event is a rubber ball that you throw. The trigger is a dog that chases after a thrown ball.
If there's some other difference that you have in mind that makes it "dangerous" (note: OP has edited this choice of word out of the question) to compare triggers and events, you can describe what you mean.
Triggers are a way to interpose
repeating logic synchronously into the
thread of execution (i.e.
synchronicity). Events are a means to
defer logic until later (i.e.
implement asynchronicity).
Okay, I see what you mean more clearly. But I think it's in some ways subject to the implementation. I wouldn't assume an event handler has to deregister itself; it depends on the system you're using. A UNIX signal handler, for example, has to prevent itself from catching a new signal while it's already handling one. But a Java servlet inside a Tomcat container should be thread-safe because it may be called concurrently by multiple threads. They're both event handlers, of different kinds.
Event handlers may be synchronous or asynchronous. Can a handler in a publish/subscribe system read messages that were posted recently, but prior to the handler registering its interest? Or only messages posted concurrently?
There's another important reason to treat triggers as different from event handlers: I frequently recommend against doing anything in a trigger that affects state outside the database.
For example, sending an email, writing to a file, posting to a web service, or forking a process is inappropriate inside a trigger. If for no other reason than the transaction that spawned the trigger may be rolled back, but you can't roll back those external effects. You may not even be using explicit transactions, but say you send an email in a BEFORE trigger, but the operation fails because of a NOT NULL constraint or something.
Instead, all such work should be done by code in one's application, after one has confirmed that the SQL operation was successful and the transaction committed.
It's too bad that people keep trying to do inappropriate work inside a trigger. There are senior developers at MySQL who promote UDFs to read and write data in memcached. Wow -- I just noticed these have made it into the MySQL 6.0 product!! Shocking!
So here's another attempt at an analogy, comparing triggers and events to the process of a criminal trial:
A BEFORE trigger is an allegation.
An AFTER trigger is an indictment.
COMMIT is a conviction after a guilty verdict.
ROLLBACK is an acquittal after an innocent verdict.
You only want to put the perpetrator in prison after they are convicted.
Whereas an EVENT is the crime itself.

Resources