Distinguishing Legitimate Anomalies From True Anomalies - anomaly-detection

In order to detect anomalous user commands, I am using a One Class SVM (OCSVM). First, I build a OCSVM model over normal commands of the user. Then, upon receiving a new command, the model either output +1 for normal commands, and -1 for abnormal ones.
However, after a while, the user may change his behavior. Consequently, new legitimate commands will be considered as abnormal (i.e., false alarms) as they do not conform to previous user behavior.
How can my OCSVM model autonomously adapt to legitimate user behavior change?
More specifically, How can I distinguish between a true anomaly from a false anomaly ? since I am not sure for a new command, considered as abnormal, if it is really abnormal or it belongs to the legitimate user changing his behavior.
I am finding many related concepts such as Incremental and Online Learning, but I think those can be used to retrain the model only when we are sure about the label of a new instance (a new command in my case).

Related

Eventual consistency - Axon conflict resolver

I'm working on a PoC to evaluate the use of Axon framework for the development of a new application.
My concern is about the eventual consistency with the CQRS pattern since consistency is a requirement for us.
There are a lot of articles and threads about this topic, so I apologize if I'm creating a duplicate thread.
Axon offer a conflict resolver but I'm not sure to understand how it works.
I found an example on a open source project.
This solution stores the version of the aggregate in the event store and read model. The client will read then the version from the read model.
What if I have different read models, could there be version conflicts?
How does Axon solve the conflicts?
Thanks
Before we dive into how Axon deals with consistency, there are a few things that I'd like to point out in the context of CQRS as a concept.
There is a lot of misconception around consistency in combination with CQRS. The concept of eventual consistency applies between the different models that you have defined within your application. For example, a Command Model may have changed state recently, but the Query Model doesn't reflect that state yet. The Query Model is eventually consistent with the Command Model. However, the information within that Query Model is still consistent in itself.
More importantly, this allows you to make conscious choices around where consistency is important and where it can be relaxed. Typically, Command Models make decisions in which consistency is important. You'd want to make sure each decision is made with the relevant knowledge of recent changes. That's the purpose of the Aggregate. An Aggregate will always make decisions that are consistent with its state.
I recommend reading up on the Reactive Principles document [1], namely Section V [2].
Then Axon. Axon implements the concepts of DDD and CQRS very strictly. Consistency is sacred within an Aggregate. For example, when using Event Sourcing, the events with an Aggregate's stream are guaranteed to have been generated based on a State that included all previous events in that stream. In other words, event number 9 in the stream was created with the knowledge of events number 0 through 8. Guaranteed.
When events are published, this doesn't mean any projections are already up to date. This may take a few milliseconds. Relaxing consistency here allows us to scale our system. The only downside is that a user may execute a command, perform a query and not see the results yet. This is actually much more common in systems than you think. There are numerous ways to prevent this from being a problem. Updating user interfaces in real-time is a powerful way of working with this. Then it doesn't matter which user made the change; they see it practically immediately.
The other way round may pose a challenge. A user observes the system state through a Query. This may (and always will, even without CQRS) provide stale data; the data may have been altered while the user is watching it. The user decides to make a change. However, in parallel, the information has already been changed. This other change may be such that, had the user known, it would have never submitted that Command.
In Axon, you can use Conflict Resolvers to detect these "unseen" parallel actions. You can use the "aggregate sequence" from incoming events and store them with your projection. If a user action results in a Command towards that aggregate, pass the Aggregate Sequence as Expected Aggregate Version. If the actual Aggregate's version doesn't match this (because it has been altered in the meantime), you get to decide whether that is problematic. There is a short explanation in the Reference Guide [3].
I hope this sheds some light on consistency in the context of CQRS and Axon.
[1] https://principles.reactive.foundation
[2] https://principles.reactive.foundation/principles/tailor-consistency.html
[3] https://docs.axoniq.io/reference-guide/axon-framework/axon-framework-commands/modeling/conflict-resolution

Why there is no include link between create and delete use cases

I have seen on the internet many examples of use cases diagrams (in UML) as this one:
What I see is that the delete use case does not include the create use case. Even though I can't imagine deleting a user without creating it.
I wonder why it is still right to not use the include ? And I wonder when should I use it and when to not use it ?
If there is Delete-User - - <<include>> - -> Create-User that means during the execution of the UC Delete-User the UC Create-User is also executed, and of course that has no sense.
The expected behavior can be :
Delete-User has the prerequisite Create-User was successfully executed for the same user and Delete-User was not already executed successfully for the same user (after the last Create-User then)
or Delete-User can be executed without prerequisite but if the user does not exist (Create-User was not executed successfully for the same user, or Delete-User was already executed for the same user after the last creation or the user) this is an error case
Bruno's excellent answer already explains why it's not a good idea to include Create into Delete, and what alternatives may be used to express the relation that you explained between the two use-cases.
But in case it helps, here another angle:
A use-case diagram does not represent a logical sequence of activities.
A use-case only represents a goal for an actor that motivates his/her interaction with the system independently of the other use-cases and the system's history. So, the simple fact that a sysAdmin may want at a moment in time to delete a User is sufficient for the use-case Delete to exist on its own.
include shows that a goal may include some other goals of interest for the user. Inclusion is not for functional decomposition where you'd break down what needs to be done in all the details. It's not either to show the sequential dependency. So for Delete, you shall not include what happens before, because happens-before is sequentiality. Inclusion only highlight some relevant sub-goals that are meaningful for the user and that the user always want to achieve when aiming at the larger goal.
finally, a use-case Delete may have perfect sense even if the use-case Create was never performed by any actor, for example because:
the new system took over a legacy database with all its past account, and the first thing that the SysAdmin will do in the new system, will be to clean-up the already existing old unused accounts before creating new ones.
the SysAdmin wants to Delete an account but finds out only during the interaction that the account didn't exist, was misspelled, or was already deleted. These possibilities are all be alternate flows that you would describe in the narrative of the the same use-case.
or if not Create use-case would be foreseen, because the user creation would be done automatically in the background (e.g. based on an SSO), without the actors being involved at all.

Events and Commands difference and naming conventions

I have been having some difficulties differenciation the two recently. More specificly I have browsed stackoverflow and there is a statement that Events can be named in two different ways:
with "ing" or with past tense "ed". This can be seen here Events - naming convention and style
https://learn.microsoft.com/en-us/dotnet/standard/design-guidelines/names-of-type-members
At the same time the CQRS states that names need to be in past tense and then following their guidelines the events named above with "ing" form would be commands. This gets me a bit confused? Do events mean different things in depending on the architectural context and movement. Is there a unified view on what an event and command is ?
CQRS you've been reading only had past-tense event names likely because they didn't consider pre-events. A command commands something to happen, and as such is typically formed in imperative ("click!", "fire!", "tickle!"). It makes no sense for a command to be a gerund ("clicking! clicking faster, you! or I fires you!") As it precipitates an action, it will likely trigger one or more notifications (=events) that something of note is about to happen, and afterwards that something of note did happened.
-ing events (e.g. ("Clicking") happen before the event is handled, e.g. in case someone wants to stop it. Sometimes, they are called "before" events (e.g. "BeforeClick"), or "will" events ("WillClick").
-ed events (e.g. "Clicked") happen after the event is handled, e.g. in order to affect dependents. Sometimes, they are called "after" events (e.g. "AfterClick") or "did" events ("DidClick").
Which specific scheme you follow does not really matter, as long as you (and your team, and your potential partners) are consistent about it. Since CQRS (under that name) is largely a Microsoft thing, follow what Microsoft says. Should you code for Mac, the concepts are similar - but you'd do well to go with the Apple guidelines instead.
Is there a unified view on what an event and command is ?
Unified? No, probably not. But if you want an authoritative definition, Gregor Hohpe's Enterprise Integration Patterns is a good place to start.
Command Message
Event Message
Within the context of CQRS, you should consider Greg Young's opinion to be authoritative. He is quite clear that command messages should use imperative spellings, where events use spellings of changes that completed in the past.
Names of Commands and Events should be understood to be spelling conventions -- much in the same way that the spellings of URI, or variable names, are conventions. It doesn't matter at all for correctness, and the computer is for the most part not looking at the spelling (for example, we route messages based on the message name, not by looking at the verb tense).
Events describe a change to the state of a model; all events are ModelChanged. However, we prefer to use domain specific spellings for the type of the event, so that they can be more easily discriminated: MouseClicked, ConnectionClosed, FundsTransfered, and so on.
Use of the present progressive tense spelling for an event name is weird, in so far as the message is a description of the domain model at the point of a transaction, where the present tense semantically extends past that temporal point. More loosely, present progressive describes a current state, rather than a past change of state.
That said, finding a good past tense spelling for pre-events can be hard, and ultimately it is just a spelling convention; the work required to find an accurate spelling that is consistent with the past tense convention may not pay for itself compared to taking a natural but incorrect verb tense.

Independent subtates in a state machine

I need some suggestions. I am trying to implement an online order process through Spring state machine and am trying to construct a state diagram before I get to work. Now say my order can be canceled by three different admin users CanceledByAdmin1,CanceledByAdmin2 and CanceledByAdmin3. Should I make them substate of Cancel state or create three different states? Keeping in mind that all canceled states are the final states and independent of each other, I don't know if making substates does anything other than simplifying the paper diagram. Any help would be appreciated.
What comes for Spring Statemachine we can have only one terminate state and trying to make that as collection of substates is a bit awkward because once you enter it, state machine should stop all processing. Thought this area is something what I've probably overlooked and could try to enhance things.
While you could probably have a state S1 having three substates S11/S11E,S12/S12E andS13/S13E with triggerless transition from S11 to S11E and same with other substates, even this feels a bit weird because none of those would actually terminate root state machine.
I guess question is what you're trying to accomplish?
If you only want to keep that information around who/which cancelled the order, could you use a simple single terminate state and during a transition to that terminate state, add/modify extended state variables with this info.
Extended state variables are usually used to overcome these problems of suddenly having astronomical count of states to keep arbitrary information around. I know that in this example you only have three, but what about if you have 10, or 100? If you actually need to add even one more, you need to change state machine configuration and recompile. With extended state variables you would not need to do that.

Data Structures to store complex option relationships in a GUI

I'm not generally a GUI programmer but as luck has it, I'm stuck building a GUI for a project. The language is java although my question is general.
My question is this:
I have a GUI with many enabled/disabled options, check boxes.
The relationship between what options are currently selected and what option are allowed to be selected is rather complex. It can't be modeled as a simple decision tree. That is options selected farther down the decision tree can impose restrictions on options further up the tree and the user should not be required to "work his way down" from top level options.
I've implemented this in a very poor way, it works but there are tons of places that roughly look like:
if (checkboxA.isEnabled() && checkboxB.isSelected())
{
//enable/disable a bunch of checkboxs
//select/unselect a bunch of checkboxs
}
This is far from ideal, the initial set of options specified was very simple, but as most projects seem to work out, additional options where added and the definition of what configuration of options allowed continually grew to the point that the code, while functional, is a mess and time didn't allow for fixing it properly till now.
I fully expect more options/changes in the next phase of the project and fully expect the change requests to be a fluid process. I want to rebuild this code to be more maintainable and most importantly, easy to change.
I could model the options in a many dimensional array, but i cringe at the ease of making changes and the nondescript nature of the array indexes.
Is there a data structure that the GUI programmers out there would recommend for dealing with a situation like this? I assume this is a problem thats been solved elegantly before.
Thank you for your responses.
The important savings of code and sanity you're looking for here are declarative approach and DRY (Don't Repeat Yourself).
[Example for the following: let's say it's illegal to enable all 3 of checkboxes A, B and C together.]
Bryan Batchelder gave the first step of tidying it up: write a single rule for validity of each checkbox:
if(B.isSelected() && C.isSelected()) {
A.forceOff(); // making methods up - disable & unselected
} else {
A.enable();
}
// similar rules for B and C...
// similar code for other relationships...
and re-evaluate it anytime anything changes. This is better than scattering changes to A's state among many places (when B changes, when C changes).
But we still have duplication: the single conceptual rule for which combinations of A,B,C are legal was broken down into 3 rules for when you can allow free changes of A, B, and C. Ideally you'd write only this:
bool validate() {
if(A.isSelected() && B.isSelected() && C.isSelected()) {
return false;
}
// other relationships...
}
and have all checkbox enabling / forcing deduced from that automatically!
Can you do that from a single validate() rule? I think you can! You simulate possible changes - would validate() return true if A is on? off? If both are possible, leave A enabled; if only one state of A is possible, disable it and force its value; if none are possible - the current situation itself is illegal. Repeat the above simulation for A = other checkboxes...
Something inside me is itching to require here a simulation over all possible combinations of changes. Think of situations like "A should not disable B yet, because while illegal currently with C on, enabling B would force C off, and with that B is legal"... The problem is that down that road lies complete madness and unpredictable UI behaviour. I believe simulating only changes of one widget at a time relative to current state is the Right Thing to do, but I'm too lazy to prove it now. So take this approach with a grain of scepticism.
I should also say that all this sounds at best confusing for the user! Sevaral random ideas that might(?) lead you to more usable GUI designs (or at least mitigate the pain):
Use GUI structure where possible!
Group widgets that depend on a common condition.
Use radio buttons over checkboxes and dropdown selections where possible.
Radio buttons can be disabled individually, which makes for better feedback.
Use radio buttons to flatten combinations: instead of checkboxes "A" and "B" that can't be on at once, offer "A"/"B"/"none" radio buttons.
List compatibility constraints in GUI / tooltips!
Auto-generate tooltips for disabled widgets, explaining which rule forced them?
This one is actually easy to do.
Consider allowing contradictions but listing the violated rules in a status area, requiring the user to resolve before he can press OK.
Implement undo (& redo?), so that causing widgets to be disabled is non-destructive?
Remember the user-assigned state of checkboxes when you disable them, restore when they become enabled? [But beware of changing things without the user noticing!]
I've had to work on similar GUIs and ran into the same problem. I never did find a good data structure so I'll be watching other answers to this question with interest. It gets especially tricky when you are dealing with several different types of controls (combo boxes, list views, checkboxes, panels, etc.). What I did find helpful is to use descriptive names for your controls so that it's very clear when your looking at the code what each control does or is for. Also, organization of the code so that controls that affect each other are grouped together. When making udpates, don't just tack on some code at the bottom of the function that affects something else that's dealt with earlier in the function. Take the time to put the new code in the right spot.
Typically for really complex logic issues like this I would go with an inference based rules engine and simply state my rules and let it sort it out.
This is typically much better than trying to 1. code the maze of if then logic you have; and 2. modifying that maze later when business rules change.
One to check out for java is: JBoss Drools
I would think of it similarly to how validation rules typically work.
Create a rule (method/function/code block/lambda/whatever) that describes what criteria must be satisfied for a particular control to be enabled/disabled. Then when any change is made, you execute each method so that each control can respond to the changed state.
I agree, in part, with Bryan Batchelder's suggestion. My initial response was going to be something a long the lines of a rule system which is triggered every time a checkbox is altered.
Firstly, when a check box is checked, it validates (and allows or disallows the check) based on its own set of conditions. If allowed, it to propagate a change event.
Secondly, as a result of the event, every other checkbox now has to re-validate itself based on its own rules, considering that the global state has now changed.
On the assumption that each checkbox is going to execute an action based on the change in state (stay the same, toggle my checked status, toggle my enabled status), I think it'd be plausible to write an operation for each checkbox (how you associate them is up to you) which either has these values hardcoded or, and what I'd probably do, have them read in from an XML file.
To clear this up, what I ended up doing was a combination of provided options.
For my application the available open source rule engines were simply massive over kill and not worth the bloat given this application is doing real time signal processing. I did like the general idea.
I also didn't like the idea of having validation code in various places, I wanted a single location to make changes.
So what I did was build a simple rule/validation system.
The input is an array of strings, for me this is hard coded but you could read this from file if you wish.
Each string defines a valid configuration for all the check boxes.
On program launch I build a list of check box configurations, 1 per allowed configuration, that stores the selected state and enabled state for each checkbox. This way each change made to the UI just requires a look up of the proper configuration.
The configuration is calculated by comparing the current UI configuration to other allowed UI configurations to figure out which options should be allowed. The code is very much akin to calculating if a move is allowed in a board game.

Resources